library("survival")
library("survminer")
## Loading required package: ggplot2
## Loading required package: ggpubr
##
## Attaching package: 'survminer'
## The following object is masked from 'package:survival':
##
## myeloma
library("viridis")
## Loading required package: viridisLite
library("plotly")
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library("htmlwidgets")
library("GGally")
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ tibble 3.1.6 ✓ dplyr 1.0.8
## ✓ tidyr 1.2.0 ✓ stringr 1.4.0
## ✓ readr 2.1.2 ✓ forcats 0.5.1
## ✓ purrr 0.3.4
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks plotly::filter(), stats::filter()
## x dplyr::lag() masks stats::lag()
# Creating basic model:
surv1 <- survfit(Surv(time,status) ~ sex, data = lung)
# Getting data for plotly graph
p <- ggsurv(surv1, CI = T)
df <- data.frame(prob = p$data$surv, sex = p$data$group,
upper = p$data$up, lower = p$data$low, time = p$data$time)
df$sex <- ifelse(df$sex == 1, "Male", "Female")
df$sex <- as.factor(df$sex)
df_wider <- df %>%
pivot_wider(names_from = sex, values_from = prob)
df_wider$MaleUpper <- ifelse(!is.na(df_wider$Male), df_wider$upper, NA)
df_wider$MaleLower <- ifelse(!is.na(df_wider$Male), df_wider$lower, NA)
df_wider$FemaleUpper <- ifelse(!is.na(df_wider$Female), df_wider$upper, NA)
df_wider$FemaleLower <- ifelse(!is.na(df_wider$Female), df_wider$lower, NA)
# Plotly interactive graph:
p_i <- plot_ly() %>%
add_lines(x = df_wider$time, y = ~df_wider$MaleUpper, name = "Upper Bound",
line = list(width=.5, dash="dot", color = '#481567FF'), showlegend = FALSE
, hovertemplate = paste('%{y}')) %>%
add_lines(x = df_wider$time, y = ~df_wider$Male, name = "Male", line = list(color = '#481567FF')
, hovertemplate = paste('%{y}')) %>%
add_lines(x = df_wider$time, y = ~df_wider$MaleLower, name = "Lower Bound",
line = list(width=.5, dash="dot", color = '#481567FF'), showlegend = FALSE
, hovertemplate = paste('%{y}')) %>%
add_lines(x = df_wider$time, y = ~df_wider$FemaleUpper, name = "Upper Bound",
line = list(width=.5, dash="dot", color = '#DCE319FF'), showlegend = FALSE
, hovertemplate = paste('%{y}')) %>%
add_lines(x = df_wider$time, y = ~df_wider$Female, name = "Female", line = list(color = '#DCE319FF')
, hovertemplate = paste('%{y}')) %>%
add_lines(x = df_wider$time, y = ~df_wider$FemaleLower, name = "Lower Bound",
line = list(width=.5, dash="dot", color = '#DCE319FF'), showlegend = FALSE
, hovertemplate = paste('%{y}')) %>%
rangeslider() %>%
layout(hovermode = "x unified",
title = "Survival Curves of Men and Women With Lung Cancer",
xaxis = list(title = "Time (in days)"),
yaxis = list(title = "Survival Probability"))
p_i
This data source can be found in the “survival” r package and can be pulled from any machine just by calling this package in r. The description of the data and for each of the variables can be accessed via running “?lung” code. The description provided states: “Survival in patients with advanced lung cancer from the North Central Cancer Treatment Group. Performance scores rate how well the patient can perform usual daily activities.”
I want to show the difference of survival rates between males and females with 95% confidence bands. I want the user to easily see from a high level the difference between the two groups and to further be able to compare the probabilities per a specified time range. The addition of a time range and interaction with the graph more conveniently allows the user to zoom in to see where there is overlap and by what degree. The tooltip contains the upper and lower bounds of each group to provide the numeric values.
I first built a basic survival model to predict survival times by sex. I put this model in a ggsurv plot to grab the probabilities for each sex and each of the upper and lower confidence band values. Converting the ggsurv plot to a plotly graph did not allow the customization I needed in order to make the graph more readable, color coordinated, and tooltip friendly. I decided to directly use plotly to incorporate these features so I manipulated the data frame to be in wide format in order for plotly to be able to display everything I wanted. I created confidence bands for each of the sexes and made them dashed lines with smaller line widths. This way it easily distinguishes between the predicted lines and their respective confidence bands without getting too overwhelming. I also needed to have each line coordinated to a specific color in order to maintain consistency. Adding each of these lines cluttered the legend, so I added code to hide the confidence bands and to keep only the predicted lines. After adding in the hovermode function with “x” as the input, it was hard to interpret the graph. To fix this, I changed the input to “x unified” so that there would only be one tooltip, and then I reordered the upper and lower bounds of the predicted lines. Finally, I added a title and x and y axes labels to make the graph interpretable.